This vignette outlines the steps of Integration, clustering, differential analysis, functional annotation and visualization of intracellular pathogens at single-cell level using PathogenTrackExplorer. PathogenTrackExplorer requires Gene Expression Matrix (produced by cellranger) and Pathogen Abundance Matrix (produced by PathogenTrack) as the user input and explores intracellular pathogens by integration, clustering, differential analysis, functional annotation and visualization.
Before start this vignette, users should download the demo data and uncompress the .gz file under the following commands in bash environment.
# Please run in bash
git clone git@github.com:rstatistics/scNSCLC_matrices.git
cd scNSCLC_matrices/
gzip -d *.gz
library(Seurat)
library(Matrix)
library(dplyr)
library(ggplot2)
library(reshape2)
library(cowplot)
library(harmony)
library(ggthemes)
library(ggrepel)
library(topGO)
library(org.Hs.eg.db)
library(PathogenTrackExplorer)
PathogenTrackExplorer requires Gene Expression Matrix (produced by cellranger) and Pathogen Abundance Matrix (produced by PathogenTrack). PathogenTrackExplorer integrated these matrix automatically, and build an integrated Seurat object. This step may take several hours to complete.
SeuratObject <- vIntegrate(x="./scNSCLC_matrices/data/SampleInfo.txt", min.nFeature=500, max.nFeature=4000, percent.mito=10, min.nCount_RNA=1000, max.nCount_RNA = Inf, nVariable=2000, nPC=30, res=0.5)
SeuratObject <- subset(SeuratObject, group=="NSCLC")
saveRDS(SeuratObject, file = "./NSCLC_SeuratObject.rds")
SeuratObject <- readRDS("./NSCLC_SeuratObject.rds")
We use Seurat::FindAllMarkers to find cluster biomarkers. Only ‘MAST’ test method is used in PathogenTrackExplorer.
SeuratObject@misc$DE <- Seurat::FindAllMarkers(object = SeuratObject, assay = 'RNA', only.pos = TRUE, test.use = 'MAST')
We employ topGO to perform enrichment analysis.
SeuratObject@misc$GO <- xGO(object = SeuratObject, key = "DE", clusters = NULL)
We use Seurat::DimPlot to show cells and clusters in UMAP graph. Each dot represents a cell, and each color represents the identity class.
DimPlot(SeuratObject, cols = SeuratObject@misc$cols)
xPopulationPlot function is built to visualize cell population fraction in samples or clusters.
xPopulationPlot(SeuratObject, by="sample", cols=SeuratObject@misc$cols)
The xDotPlot is a wrapper for Seurat::DotPlot. The size of the dot stands for the percentage of cells expressing the feature in each cluster. The color represents the average expression level. Here we use it to show the top 3 high expressed features in each cluster.
SeuratObject@misc$DE %>% group_by(cluster) %>% top_n(n = 3, wt = avg_logFC) -> top3
xDotPlot(SeuratObject, features = top3$gene)
The xFeaturePlot is a wrapper for Seurat::FeaturePlot, and it accepts one or more features to draw a patchworked figure automatically.
markers = c('CD3D', 'MS4A1', 'SDC1', 'MZB1', 'TPPP3', 'KRT18', 'CD68', 'CD163', 'FCN1', 'FCGR3B', 'HBA1', 'KLRD1', 'FCER1A', 'CD1C', 'CLEC9A', 'LILRA4', 'TPSB2', 'CD34', 'FUT4', 'LYZ')
xFeaturePlot(object=SeuratObject, features=markers, font.size = 8)
#png(file="SeuratObject_markers_plot.png", width = dpi*12, height = dpi*8, units = "px",res = dpi,type='cairo')
#xFeaturePlot(object=SeuratObject, features=markers, font.size = 8)
#dev.off()
Here, we use xGOBarPlot to show GO enchriment of specific cluster.
xGOBarPlot(SeuratObject, key = "GO", cluster = "9", ont = "BP", top_n = 10)
xGOBoxPlot is used to show GO enrichment of all clusters.
xGOBoxPlot(SeuratObject, key = "GO", ont = "BP", top_n = 3)
Also, we can use xGODotPlot to show GO enrichment of all clusters.
xGODotPlot(SeuratObject, key = "GO", ont = "BP", top_n = 3)
Top 25 abundant pathogens are used to show in xFeaturePlot.
species <- c("Pseudomonas tolaasii", "Xanthomonas euvesicatoria pv. alfalfae CFBP 3836", "Streptomyces sp. GY16", "Clostridium botulinum", "Wenyingzhuangia fucanilytica", "Bacillus megaterium", "Klebsiella oxytoca", "Microcystis aeruginosa FD4", "Porphyrobacter neustonensis", "Acinetobacter sp. WCHAc010034", "Hathewaya histolytica", "Candidatus Desulforudis audaxviator MP104C", "Pasteurella multocida subsp. multocida", "Mycoplasma wenyonii str. Massachusetts", "Halanaerobium hydrogeniformans", "Pseudomonas sp. R32", "Staphylococcus aureus", "Oceanisphaera profunda", "Flavobacterium alkalisoli", "Staphylococcus epidermidis", "Nostoc sp. NIES-4103", "Escherichia coli", "Labilibaculum antarcticum", "Anaerohalosphaera lusitana", "Bradymonadales bacterium V1718", "Mucilaginibacter sp. HYN0043", "Streptomyces collinus Tu 365", "Rahnella sp. ERMR1:05", "Intrasporangium calvum DSM 43043", "Treponema brennaborense DSM 12168", "Helicobacter bilis", "Parabacteroides distasonis ATCC 8503", "Leptotrichia shahii", "Staphylococcus hominis", "Gemella sanguinis", "Acidihalobacter prosperus", "Pseudomonas cedrina", "Spiroplasma sp. TIUS-1", "Mycoplasma hyorhinis", "Brevibacillus laterosporus", "Pasteurella multocida", "Corynebacterium minutissimum", "Methanobrevibacter smithii", "Allochromatium vinosum DSM 180", "Actinoplanes sp. N902-109", "Halomonas sp. N3-2A", "Clostridium cellulovorans 743B", "Sphaerospermopsis kisseleviana NIES-73", "Campylobacter jejuni", "Saccharomonospora glauca K62")
species <- intersect(species, rownames(SeuratObject)) %>% head(25)
xFeaturePlot(object=SeuratObject, features=species, font.size = 8)
#png(file="SeuratObject_species_plot.png", width = dpi*16, height = dpi*12, units = "px",res = dpi,type='cairo')
#xFeaturePlot(object=SeuratObject, features=species, font.size = 8, order = TRUE)
#dev.off()
Here, the abundance of “Streptomyces sp. GY16” is shown in xFeaturePlot.
xFeaturePlot(object=SeuratObject, features=c("Streptomyces sp. GY16"), font.size = 8, order = TRUE)
The vDE is a wrapper of Seurat::FindMarkers, which is employed to do differential analysis between pathogen infected and pathogen negative cells of each cluster. Here we use vDE to find differential genes between “Streptomyces sp. GY16” positive and “Streptomyces sp. GY16” negative cells in cluster “7” and cluster “8”.
SeuratObject@misc$vDE <- vDE(SeuratObject, features.by=c("Streptomyces sp. GY16"), min.cells = 20, clusters = as.character(7:8))
## ### Analysis Cluster-7 ...
## ====== Analysis feature Streptomyces sp. GY16 ... done.
## ### Analysis Cluster-8 ...
## ====== Analysis feature Streptomyces sp. GY16 ... done.
The vHeatMap is a warpper of Seurat::DoHeatmap, which is employed to visualize pathogen infected(+) and pathogen negative(-) cells of each cluster in heatmap. The heatmap of differential genes in cluster “7” and “8” are shown.
vHeatMap(object = SeuratObject, key = "vDE", cluster = "7", feature.by = "Streptomyces sp. GY16")
vHeatMap(object = SeuratObject, key = "vDE", cluster = "8", feature.by = "Streptomyces sp. GY16")
The vVolcanoPlot enables quick visual identification of significant features with small pvalue and large fold changes. The volcano plot of differential genes in cluster “7” and “8” are shown.
vVolcanoPlot(object = SeuratObject, key = "vDE", feature.by = "Streptomyces sp. GY16", cluster = "7", top_n = NULL)
vVolcanoPlot(object = SeuratObject, key = "vDE", feature.by = "Streptomyces sp. GY16", cluster = "8", top_n = NULL)
We employ topGO to perform enrichment analysis of differentially expressed features between pathogen(+) and pathogen(-) cells of each cluster. Here we perform go enrichment of “Streptomyces sp. GY16” positive and negative in cluster “7” as an example.
SeuratObject@misc$vGO <- vGO(SeuratObject, key = "vDE", cluster = "7", feature.by = "Streptomyces sp. GY16")
## [1] "Running Streptomyces sp. GY16 in Cluster-7 ... "
## [1] "done."
Here we use vGOBarPlot to show the go enrichment of “Streptomyces sp. GY16” positive and negative in cluster “7” as an example.
vGOBarPlot(SeuratObject, key="vGO", cluster="7", feature="Streptomyces sp. GY16", ont="BP", top_n=20)
Then we use vGODotPlot to show the go enrichment of “Streptomyces sp. GY16” positive and negative in cluster “7” as an example. The color represents the significance, and the dot size represents for gene numbers that enriched in the corresponding pathway.
vGODotPlot(SeuratObject, key="vGO", cluster="7", feature="Streptomyces sp. GY16", ont="BP", top_n=20)
Here we perform go enrichment of “Streptomyces sp. GY16” positive and negative in cluster “8” as an example.
SeuratObject@misc$vGO <- vGO(SeuratObject, key = "vDE", cluster = "8", feature.by = "Streptomyces sp. GY16")
## [1] "Running Streptomyces sp. GY16 in Cluster-8 ... "
## [1] "done."
Then we use vGODotPlot to show the go enrichment of “Streptomyces sp. GY16” positive and negative in cluster “8” as an example.
vGOBarPlot(SeuratObject, key="vGO", cluster="8", feature="Streptomyces sp. GY16", ont="BP", top_n=20)
Then we use vGODotPlot to show the go enrichment of “Streptomyces sp. GY16” positive and negative in cluster “8” as an example. The color represents the significance, and the dot size represents for gene numbers that enriched in the corresponding pathway.
vGODotPlot(SeuratObject, key="vGO", cluster="8", feature="Streptomyces sp. GY16", ont="BP", top_n=20)
sessionInfo()
## R version 3.6.0 (2019-04-26)
## Platform: x86_64-redhat-linux-gnu (64-bit)
## Running under: CentOS Linux 7 (Core)
##
## Matrix products: default
## BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so
##
## locale:
## [1] LC_CTYPE=zh_CN.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=zh_CN.UTF-8 LC_COLLATE=zh_CN.UTF-8
## [5] LC_MONETARY=zh_CN.UTF-8 LC_MESSAGES=zh_CN.UTF-8
## [7] LC_PAPER=zh_CN.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=zh_CN.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] PathogenTrackExplorer_0.1.0 org.Hs.eg.db_3.10.0
## [3] topGO_2.38.1 SparseM_1.78
## [5] GO.db_3.10.0 AnnotationDbi_1.48.0
## [7] IRanges_2.20.2 S4Vectors_0.24.4
## [9] Biobase_2.46.0 graph_1.64.0
## [11] BiocGenerics_0.32.0 ggrepel_0.8.2
## [13] ggthemes_4.2.0 harmony_1.0
## [15] Rcpp_1.0.5 cowplot_1.0.0
## [17] reshape2_1.4.4 ggplot2_3.3.2
## [19] dplyr_1.0.0 Matrix_1.2-17
## [21] Seurat_3.1.5
##
## loaded via a namespace (and not attached):
## [1] Rtsne_0.15 colorspace_1.4-1
## [3] ellipsis_0.3.1 ggridges_0.5.2
## [5] XVector_0.26.0 GenomicRanges_1.38.0
## [7] leiden_0.3.3 listenv_0.8.0
## [9] farver_2.0.3 bit64_0.9-7
## [11] codetools_0.2-16 splines_3.6.0
## [13] knitr_1.29 jsonlite_1.7.1
## [15] ica_1.0-2 cluster_2.0.8
## [17] png_0.1-7 uwot_0.1.8
## [19] sctransform_0.2.1 compiler_3.6.0
## [21] httr_1.4.2 lazyeval_0.2.2
## [23] prettyunits_1.1.1 htmltools_0.5.0
## [25] tools_3.6.0 rsvd_1.0.3
## [27] igraph_1.2.5 gtable_0.3.0
## [29] glue_1.4.1 GenomeInfoDbData_1.2.2
## [31] RANN_2.6.1 rappdirs_0.3.1
## [33] vctrs_0.3.1 ape_5.4
## [35] nlme_3.1-139 lmtest_0.9-37
## [37] xfun_0.15 stringr_1.4.0
## [39] globals_0.12.5 lifecycle_0.2.0
## [41] irlba_2.3.3 future_1.18.0
## [43] zlibbioc_1.32.0 MASS_7.3-51.4
## [45] zoo_1.8-8 scales_1.1.1
## [47] MAST_1.12.0 hms_0.5.3
## [49] SummarizedExperiment_1.16.1 RColorBrewer_1.1-2
## [51] SingleCellExperiment_1.8.0 yaml_2.2.1
## [53] memoise_1.1.0 reticulate_1.16
## [55] pbapply_1.4-2 gridExtra_2.3
## [57] stringi_1.4.6 RSQLite_2.2.0
## [59] BiocParallel_1.20.1 GenomeInfoDb_1.22.1
## [61] rlang_0.4.7 pkgconfig_2.0.3
## [63] matrixStats_0.56.0 bitops_1.0-6
## [65] evaluate_0.14 lattice_0.20-38
## [67] ROCR_1.0-11 purrr_0.3.4
## [69] patchwork_1.0.1 htmlwidgets_1.5.1
## [71] labeling_0.3 bit_1.1-15.2
## [73] tidyselect_1.1.0 RcppAnnoy_0.0.16
## [75] plyr_1.8.6 magrittr_1.5
## [77] R6_2.4.1 generics_0.0.2
## [79] DelayedArray_0.12.3 DBI_1.1.0
## [81] pillar_1.4.6 withr_2.2.0
## [83] fitdistrplus_1.1-1 abind_1.4-5
## [85] survival_3.2-3 RCurl_1.98-1.2
## [87] tibble_3.0.3 future.apply_1.6.0
## [89] tsne_0.1-3 crayon_1.3.4
## [91] KernSmooth_2.23-15 plotly_4.9.2.1
## [93] rmarkdown_2.3 progress_1.2.2
## [95] grid_3.6.0 data.table_1.12.8
## [97] blob_1.2.1 digest_0.6.25
## [99] tidyr_1.1.0 munsell_0.5.0
## [101] viridisLite_0.3.0